Association-Based Bilingual Word Alignment

Author

  • Robert C. Moore
Abstract

Bilingual word alignment forms the foundation of current work on statistical machine translation. Standard word-alignment methods involve the use of probabilistic generative models that are complex to implement and slow to train. In this paper we show that it is possible to approach the alignment accuracy of the standard models using algorithms that are much faster, and in some ways simpler, based on basic word-association statistics.
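
The abstract does not say which association statistic or linking procedure is used. Purely as an illustration, the sketch below assumes Dunning's log-likelihood ratio (G²) computed from sentence-pair co-occurrence counts, combined with competitive linking, a greedy one-to-one strategy that links the highest-scoring word pairs first. The function names `llr`, `association_scores`, and `competitive_link` and the toy corpus are hypothetical; this is a sketch under those assumptions, not the paper's exact method.

```python
# Illustrative sketch (assumed method): LLR word association + competitive linking.
import math
from collections import Counter
from itertools import product


def llr(joint, count_s, count_t, n_pairs):
    """Log-likelihood-ratio (G^2) statistic for a 2x2 co-occurrence table.

    Counts are numbers of sentence pairs: `joint` pairs contain both words,
    `count_s` contain the source word, `count_t` contain the target word.
    """
    k11 = joint                                # both words present
    k12 = count_s - joint                      # source word only
    k21 = count_t - joint                      # target word only
    k22 = n_pairs - count_s - count_t + joint  # neither word
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for observed, r, c in cells:
        if observed > 0:  # 0 * log(0) is taken as 0
            expected = row[r] * col[c] / n_pairs
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def association_scores(corpus):
    """Score co-occurring (source, target) word pairs in a sentence-aligned corpus."""
    n = len(corpus)
    src_counts, tgt_counts, joint_counts = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_set, t_set = set(src), set(tgt)  # count presence per sentence pair
        src_counts.update(s_set)
        tgt_counts.update(t_set)
        joint_counts.update(product(s_set, t_set))
    scores = {}
    for (s, t), j in joint_counts.items():
        # G^2 is also large for negative association, so keep only pairs
        # that co-occur more often than chance predicts.
        if j * n > src_counts[s] * tgt_counts[t]:
            scores[(s, t)] = llr(j, src_counts[s], tgt_counts[t], n)
    return scores


def competitive_link(src, tgt, scores):
    """Greedily link token positions one-to-one, highest-scoring pair first."""
    candidates = sorted(
        ((scores.get((s, t), 0.0), i, j)
         for i, s in enumerate(src) for j, t in enumerate(tgt)),
        reverse=True,
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in candidates:
        if score <= 0.0:
            break  # remaining pairs have no positive association
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


corpus = [
    ("the house".split(), "la maison".split()),
    ("the car".split(), "la voiture".split()),
    ("a house".split(), "une maison".split()),
]
scores = association_scores(corpus)
print(competitive_link(corpus[0][0], corpus[0][1], scores))  # [(0, 0), (1, 1)]
```

Counting each word at most once per sentence pair keeps the four cells of every 2×2 table consistent with the number of sentence pairs, and everything here needs only counting and sorting, with no iterative training of a generative model.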

Similar sources

Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs

This paper proposes an approach to improve word alignment for languages with scarce resources using bilingual corpora of other language pairs. To perform word alignment between languages L1 and L2, we introduce a third language L3. Although only small amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available. Base...

Preparatory Work on Automatic Extraction of Bilingual Multi-Word Units from Parallel Corpora

Automatic extraction of bilingual Multi-Word Units is an important subject of research in the automatic bilingual corpus alignment field. There are many cases of single source words corresponding to target multi-word units. This paper presents an algorithm for the automatic alignment of single source words and target multi-word units from a sentence-aligned parallel spoken language corpus. On t...

Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond

Bilingual word alignment forms the foundation of current work on statistical machine translation. Standard word-alignment methods involve the use of probabilistic generative models that are complex to implement and slow to train. In this paper we show that it is possible to approach the alignment accuracy of the standard models using algorithms that are much faster...

Word Alignment Based on Bilingual Bracketing

In this paper, an improved word alignment based on bilingual bracketing is described. The explored approaches include using Model-1 conditional probability, a boosting strategy for lexicon probabilities based on importance sampling, applying part-of-speech information to discriminate English words, and incorporating information about English base noun phrases. The results of the shared task on French-English, ...

Bilingual Unknown Word Alignment Tool for English-Thai

This paper presents a bilingual (English and Thai) unknown-word alignment tool using techniques based on the global and local characteristics of each word in parallel texts. The distribution and location of words in the texts are analyzed to generate candidate Thai unknown words for each English unknown word. Overall accuracy of the unknown word alignment is 90.32% on 6,000 bil...

Publication date: 2005